library(tidyverse)
library(lubridate)
library(scales)
library(pander)
library(plotly)
panderOptions("big.mark", ",")
panderOptions("table.split.table", Inf)
panderOptions("table.style", "rmarkdown")
theme_set(theme_bw())
Disclaimer: This very simple report was prepared by a bioinformatician with no experience in epidemiology or virology, and as such should be treated simply as an alternate viewpoint on the data, which I was simply unable to find elsewhere. Many other people exist with much greater expertise on this subject. However, I do hope this provides a useful perspective which is able to add constructively to the wider discussion.
Data was sourced from Johns Hopkins University (https://coronavirus.jhu.edu/), using the datasets provided by JHU at https://github.com/CSSEGISandData/COVID-19. JHU data is updated every 24 hours at approximately 23:59(UTC), which is about 10:30AM in Adelaide. As a result Australian numbers may lag local media reports.
Additionally, the official government figures at www.health.gov.au appear to lag media reports, likely to ensure accuracy of the official numbers. The official Australian numbers are not used for this analysis, instead relying on those provided by JHU. However, JHU are extremely careful and comprehensive in their sourcing of numbers and significant disparity is not expected.
confirmed <- url("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv") %>%
read_csv() %>%
pivot_longer(
cols = ends_with("20"),
names_to = "date",
values_to = "confirmed"
) %>%
mutate(
date = str_replace_all(
date, "(.+)/(.+)/(.+)", "20\\3-\\1-\\2"
) %>%
ymd()
) %>%
dplyr::rename(
Country = `Country/Region`
) %>%
dplyr::mutate(
Country = case_when(
grepl("China", Country) & `Province/State` == "Hubei" ~ "China (Hubei)",
grepl("China", Country) & `Province/State` != "Hubei" ~ "China (Other)",
!grepl("China", Country) ~ Country
)
) %>%
dplyr::filter(
!is.na(confirmed)
)
Population sizes for the most impacted countries were manually obtained from https://www.worldometers.info/ and should not be considered as authoritative. Given the disparity of infection within China, China was broken into Hubei Province and the rest of China. As an open acknowledgement of the crudeness of population values, population estimates for Hubei Province were taken from the 2018 estimates given by Statista.com. This particular population estimate is likely to be low, and as a result confirmed case rates for Hubei province may be an overestimate, and consequently, confirmed case rates for the rest of China may also be an underestimate.
Confirmed cases of COVID-19 as provided by the Chinese Government have been discussed elsewhere as unusual, and data appears potentially unreliable. In this analysis, discussions regarding accurate Chinese reporting are not considered further and data is simply presented as supplied by JHU.
However, all countries are likely to contain many unreported cases given the incomplete testing regimes in place for literally every country. Similarly, reporting in many countries may have features that cause concerns regarding data integrity and this makes comparison across countries difficult.
hb <- 59170000
pops <- tribble(
~Country, ~Population,
"China (Hubei)", hb,
"China (Other)", 1408526449 - hb,
"Italy", 60486925,
"Iran", 83677594,
"Spain", 46754778,
"Germany", 83783942,
"Korea, South", 51269185,
"France", 65273511,
"US", 331002651,
"Switzerland", 8654622,
"United Kingdom", 67886011,
"Netherlands", 17134872,
"Norway", 5408930,
"Austria", 9006398,
"Belgium", 11575214,
"Sweden", 10081035,
"Denmark", 5786274,
"Japan", 126476461,
"Malaysia", 32245488,
"Canada", 37742154,
"Australia", 25499884,
"Portugal", 10196709,
"Qatar", 2866531,
"Czechia", 10703010,
"Greece", 10423054
)
confirmed %>%
group_by(Country, date) %>%
summarise(confirmed = sum(confirmed)) %>%
ungroup() %>%
group_by(Country) %>%
dplyr::filter(
date == max(date)
) %>%
ungroup() %>%
right_join(pops) %>%
mutate(
rate = 1e6*confirmed / Population
) %>%
arrange(desc(rate)) %>%
# dplyr::slice(1:20) %>%
mutate(
rate = sprintf("%.1f", rate),
Population = comma(Population)
) %>%
rename_at(vars(everything()), str_to_title) %>%
rename(`Rate (Cases per Million)` = Rate) %>%
pander(
justify = "lrrrr",
caption = paste(
"The", nrow(.), "most impacted populations studied here and shown as a proportion of total population.",
"The final column provides the latest confirmed infection rate as cases per million people."
)
)
| Country | Date | Confirmed | Population | Rate (Cases per Million) |
|---|---|---|---|---|
| China (Hubei) | 2020-03-18 | 67,800 | 59,170,000 | 1145.9 |
| Italy | 2020-03-18 | 35,713 | 60,486,925 | 590.4 |
| Switzerland | 2020-03-18 | 3,028 | 8,654,622 | 349.9 |
| Spain | 2020-03-18 | 13,910 | 46,754,778 | 297.5 |
| Norway | 2020-03-18 | 1,550 | 5,408,930 | 286.6 |
| Iran | 2020-03-18 | 17,361 | 83,677,594 | 207.5 |
| Denmark | 2020-03-18 | 1,115 | 5,786,274 | 192.7 |
| Austria | 2020-03-18 | 1,646 | 9,006,398 | 182.8 |
| Korea, South | 2020-03-18 | 8,413 | 51,269,185 | 164.1 |
| Qatar | 2020-03-18 | 452 | 2,866,531 | 157.7 |
| Germany | 2020-03-18 | 12,327 | 83,783,942 | 147.1 |
| France | 2020-03-18 | 9,105 | 65,273,511 | 139.5 |
| Belgium | 2020-03-18 | 1,486 | 11,575,214 | 128.4 |
| Sweden | 2020-03-18 | 1,279 | 10,081,035 | 126.9 |
| Netherlands | 2020-03-18 | 2,058 | 17,134,872 | 120.1 |
| Portugal | 2020-03-18 | 448 | 10,196,709 | 43.9 |
| Czechia | 2020-03-18 | 464 | 10,703,010 | 43.4 |
| Greece | 2020-03-18 | 418 | 10,423,054 | 40.1 |
| United Kingdom | 2020-03-18 | 2,642 | 67,886,011 | 38.9 |
| Malaysia | 2020-03-18 | 790 | 32,245,488 | 24.5 |
| US | 2020-03-18 | 7,783 | 331,002,651 | 23.5 |
| Australia | 2020-03-18 | 568 | 25,499,884 | 22.3 |
| Canada | 2020-03-18 | 657 | 37,742,154 | 17.4 |
| China (Other) | 2020-03-18 | 13,302 | 1,349,356,449 | 9.9 |
| Japan | 2020-03-18 | 889 | 126,476,461 | 7.0 |
minToInclude <- 5
startingPoint <- 1
p <- confirmed %>%
group_by(Country, date) %>%
summarise(confirmed = sum(confirmed)) %>%
ungroup() %>%
right_join(pops) %>%
mutate(
rate = 1e6*confirmed / Population
) %>%
group_by(Country) %>%
dplyr::filter(
max(rate) > minToInclude
) %>%
ungroup() %>%
split(f = .$Country) %>%
lapply(function(x, r = startingPoint){
std <- min(dplyr::filter(x, rate > r)$date)
x %>%
dplyr::filter(date >= std) %>%
mutate(days = date - std)
}) %>%
bind_rows() %>%
mutate(
days = as.integer(days),
rate = round(rate, 2)
) %>%
dplyr::filter(days <= 30) %>%
rename_all(str_to_title) %>%
ggplot(
aes(Days, Rate, colour = Country, Date = Date, Confirmed = Confirmed)
) +
geom_point() +
geom_line() +
scale_y_log10() +
xlab(
paste(
"Days since passing",
startingPoint,
"confirmed case/million"
)
) +
ylab("Confirmed Infection Rate (cases/million)")
ggplotly(p)
Confirmed cases of COVID-19 for countries where the infection rate has exceeded 5 cases/million. Data is only shown for the first 30 calendar days since passing 1 confirmed case/million. Note that from the day records begin in this dataset, the confirmed infection rate in Hubei was 7.59 confirmed cases/million. To hide a country, click on the country in the plot legend. Clicking again on the country in the legend will restore the data within the plot. Regions of the plot are also able to be zoomed interactively. Please note the y-axis is shown on the logarithmic scale, so that a series of points which appear to be diagonal will indicate exponential growth. The flatter the line, the slower the growth and a perfectly horizontal line would indicate zero growth, or no new confirmed cases.
All figures and tables presented here simply aim to show an alternative viewpoint on the data. Every way to view COVID-19 data will mask important features, and the values shown here do not take into account vital factors such as population density, variability of infection across regions within countries, social culture and demographics. Many countries may not be directly comparable for a combination of the above factors. It is simply to view the data through the lens of a country’s population size using a value which should be easily interpretable.
In the above plot:
Additionally, Australia’s spread of the virus appears slower than many other countries.
Turning to other countries beyond Australia:
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
locale: LC_CTYPE=en_AU.UTF-8, LC_NUMERIC=C, LC_TIME=en_AU.UTF-8, LC_COLLATE=en_AU.UTF-8, LC_MONETARY=en_AU.UTF-8, LC_MESSAGES=en_AU.UTF-8, LC_PAPER=en_AU.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_AU.UTF-8 and LC_IDENTIFICATION=C
attached base packages: stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: plotly(v.4.9.2), pander(v.0.6.3), scales(v.1.1.0), lubridate(v.1.7.4), forcats(v.0.4.0), stringr(v.1.4.0), dplyr(v.0.8.4), purrr(v.0.3.3), readr(v.1.3.1), tidyr(v.1.0.2), tibble(v.2.1.3), ggplot2(v.3.2.1) and tidyverse(v.1.3.0)
loaded via a namespace (and not attached): Rcpp(v.1.0.3), lattice(v.0.20-40), assertthat(v.0.2.1), digest(v.0.6.25), mime(v.0.9), R6(v.2.4.1), cellranger(v.1.1.0), backports(v.1.1.5), reprex(v.0.3.0), evaluate(v.0.14), httr(v.1.4.1), pillar(v.1.4.3), rlang(v.0.4.4), lazyeval(v.0.2.2), readxl(v.1.3.1), rstudioapi(v.0.11), data.table(v.1.12.8), rmarkdown(v.2.1), labeling(v.0.3), htmlwidgets(v.1.5.1), munsell(v.0.5.0), shiny(v.1.4.0), broom(v.0.5.4), compiler(v.3.6.3), httpuv(v.1.5.2), modelr(v.0.1.6), xfun(v.0.12), pkgconfig(v.2.0.3), htmltools(v.0.4.0), tidyselect(v.1.0.0), fansi(v.0.4.1), viridisLite(v.0.3.0), later(v.1.0.0), crayon(v.1.3.4), dbplyr(v.1.4.2), withr(v.2.1.2), grid(v.3.6.3), xtable(v.1.8-4), nlme(v.3.1-144), jsonlite(v.1.6.1), gtable(v.0.3.0), lifecycle(v.0.1.0), DBI(v.1.1.0), magrittr(v.1.5), cli(v.2.0.1), stringi(v.1.4.6), farver(v.2.0.3), promises(v.1.1.0), fs(v.1.3.1), xml2(v.1.2.2), generics(v.0.0.2), vctrs(v.0.2.3), tools(v.3.6.3), Cairo(v.1.5-10), glue(v.1.3.1), hms(v.0.5.3), crosstalk(v.1.0.0), fastmap(v.1.0.1), yaml(v.2.2.1), colorspace(v.1.4-1), rvest(v.0.3.5), knitr(v.1.28) and haven(v.2.2.0)